Skip to content

fix(query): clarify condition resolution semantics for label queries#2994

Open
contrueCT wants to merge 18 commits into
apache:masterfrom
contrueCT:task/improve-condition-query-semantics
Open

fix(query): clarify condition resolution semantics for label queries#2994
contrueCT wants to merge 18 commits into
apache:masterfrom
contrueCT:task/improve-condition-query-semantics

Conversation

@contrueCT

Copy link
Copy Markdown
Contributor

Purpose of the PR

ConditionQuery.condition() currently mixes several different meanings in one API, including:

  • no condition
  • conflicting conditions resolved to empty
  • a unique resolved value
  • a raw multi-value result
  • an exception for ambiguous resolved values

This PR keeps the legacy condition() behavior unchanged, adds explicit condition-resolution APIs, and migrates the high-risk LABEL call sites to use the clearer semantics.

Main Changes

  • Add explicit condition-resolution APIs to ConditionQuery
    • containsCondition(Object key)
    • conditionValues(Object key)
    • conditionValue(Object key)
  • Keep the legacy condition() method backward-compatible
  • Document the semantic differences between the legacy API and the new explicit APIs
  • Migrate LABEL-related high-risk callers to the new APIs in:
    • graph/index transactions
    • serializers
    • traversers
    • in-memory / hstore paths
  • Preserve the old behavior for non-LABEL legacy usages in this first step

Verifying these changes

Added and extended regression coverage for the new semantics:

  • QueryTest#testConditionWithoutLabel
  • QueryTest#testConditionWithEqAndIn
  • QueryTest#testConditionWithSingleInValues
  • QueryTest#testConditionWithConflictingEqAndIn
  • QueryTest#testConditionWithMultipleMatchedInValues

Added a targeted regression for the label-index fallback path:

  • VertexCoreTest#testCollectMatchedIndexesByJointLabelsWithIndexedProperties

This test verifies:

  • a multi-label query can conservatively fall back and still match the indexed label
  • conflicting label conditions produce no matched indexes

Existing label-query regressions were also rechecked to ensure no behavior regression:

  • EdgeCoreTest#testQueryInEdgesOfVertexByLabels
  • EdgeCoreTest#testQueryInEdgesOfVertexByConflictingLabels
  • EdgeCoreTest#testQueryInEdgesOfVertexBySortkey
  • VertexCoreTest#testQueryByJointLabels
  • Trivial rework / code cleanup without any test coverage. (No Need)
  • Already covered by existing tests, such as (please modify tests here).
  • Need tests and can be verified as follows:
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='QueryTest#testConditionWithoutLabel+testConditionWithEqAndIn+testConditionWithSingleInValues+testConditionWithConflictingEqAndIn+testConditionWithMultipleMatchedInValues' test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='EdgeCoreTest#testQueryInEdgesOfVertexByLabels+testQueryInEdgesOfVertexByConflictingLabels+testQueryInEdgesOfVertexBySortkey' test
    • mvn -pl hugegraph-server/hugegraph-test -am -P core-test,memory -DfailIfNoTests=false -Dtest='VertexCoreTest#testQueryByJointLabels+testCollectMatchedIndexesByJointLabelsWithIndexedProperties' test

Does this PR potentially affect the following parts?

Documentation Status

  • Doc - TODO
  • Doc - Done
  • Doc - No Need

@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 13, 2026
@contrueCT contrueCT changed the title improve(query): clarify condition resolution semantics for label queries fix(query): clarify condition resolution semantics for label queries Apr 19, 2026
@codecov

codecov Bot commented May 6, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 80.00000% with 30 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.32%. Comparing base (c3f56b5) to head (18fe87f).

Files with missing lines Patch % Lines
...he/hugegraph/backend/tx/GraphIndexTransaction.java 52.38% 16 Missing and 4 partials ⚠️
...apache/hugegraph/backend/query/ConditionQuery.java 95.45% 0 Missing and 3 partials ⚠️
...g/apache/hugegraph/backend/store/ram/RamTable.java 0.00% 2 Missing ⚠️
.../apache/hugegraph/backend/tx/GraphTransaction.java 81.81% 0 Missing and 2 partials ⚠️
...hugegraph/backend/serializer/BinarySerializer.java 80.00% 1 Missing ⚠️
...e/hugegraph/backend/serializer/TextSerializer.java 80.00% 1 Missing ⚠️
...e/hugegraph/traversal/algorithm/HugeTraverser.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #2994      +/-   ##
============================================
- Coverage     36.10%   35.32%   -0.79%     
- Complexity      338      499     +161     
============================================
  Files           803      803              
  Lines         68291    68381      +90     
  Branches       8970     8992      +22     
============================================
- Hits          24656    24154     -502     
- Misses        40990    41604     +614     
+ Partials       2645     2623      -22     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch 2 times, most recently from 4c42786 to cc9af24 Compare May 26, 2026 12:29

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one correctness issue in the latest revision. The CI failures were posted separately as a PR-level reminder.

CI/status checks are failing on the latest head (cc9af24929e42af1c90e1f55f3e60adc351e0318). Could you check the failed jobs before the next review round?

Failed checks include:

@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from cc9af24 to 2e82f83 Compare May 30, 2026 10:20

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a clear blocking correctness issue in the latest head, and the previous LABEL-resolution comments look addressed. One remaining merge risk is that the latest checks are still red: hstore failed in VertexCoreTest#testQueryByDateProperty.

Since this PR also touches HstoreStore, could you rerun or clarify whether the hstore failure is an existing flaky/environment issue?

contrueCT added 5 commits June 5, 2026 12:59
Add explicit condition resolution APIs to ConditionQuery while preserving the legacy condition() behavior. Introduce containsCondition(Object), conditionValues(Object), and conditionValue(Object) so callers can distinguish missing, empty, unique, and multi-value results without overloading null semantics.

Migrate LABEL-specific consumers in graph/index transactions, serializers, traversers, and stores to use the new APIs for unique-label resolution and conservative fallback behavior. Extend QueryTest and VertexCoreTest to cover absent, conflicting, and multi-value label conditions as well as collectMatchedIndexes() behavior for multi-label and conflicting label queries.
@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from 94408b7 to b10e3c2 Compare June 5, 2026 05:10
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Jun 5, 2026
@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from 801923a to ebc31c8 Compare June 5, 2026 18:06
@contrueCT

contrueCT commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for your patience. The hstore CI failure exposed an existing latent issue in hstore's range-index query path. For range-index scans with limit/paging, the upper layer assumed that backend scan results were globally ordered by the range-index key and that the returned page state could be reused as a HugeGraph range cursor. In hstore, multi-node/tablet scans can return entries in backend iterator order, and the page state is an internal storage cursor, so those assumptions may lead to unstable ordering or skipped results. This PR keeps the fix intentionally scoped: hstore range-index queries whose visible result depends on limit/offset/paging are sorted and sliced in the index layer, while unbounded scans still use the original streaming path to avoid disturbing count, joint-index, and cleanup paths. I think this is enough for the current PR, but the underlying hstore scan/page-state contract should be handled in a dedicated follow-up, ideally by defining whether range scans must be globally ordered and fixing the hstore iterator/page-state semantics at the storage-client layer.

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: yes. Summary: HStore range-index offset queries can skip too many sorted results. Evidence: static review of GraphIndexTransaction/query offset handling.

@contrueCT

Copy link
Copy Markdown
Contributor Author

Thanks. I fixed this by resetting scanQuery.offset(0L) before the full sorted range-index scan, so the fallback now reads the complete matched range first and lets the original query apply offset/limit only once after sorting. I also added range-offset coverage to VertexCoreTest#testQueryByDateProperty to guard the double-skip case. Local checks passed with git diff --check, hugegraph-core compile, and VertexCoreTest#testQueryByDateProperty under the rocksdb core-test profile.

@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch 10 times, most recently from 34f1fb1 to 23d4f94 Compare June 15, 2026 04:32
@contrueCT contrueCT force-pushed the task/improve-condition-query-semantics branch from 23d4f94 to 9e24432 Compare June 15, 2026 05:03

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: yes. Summary: HStore range-index ordering is still bypassed for finite non-paging and large-page scans. Evidence: static review plus local unit/store-client tests.


🔗 Latest-head CI/status still has visible failures; please check hstore, codecov/patch, and codecov/project: https://github.com/apache/hugegraph/actions/runs/27525154399/job/81350967135

type, null, position);
}

static boolean shouldUseOrderedRangeScan(IdRangeQuery query) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Keep ordered range scans on every order-sensitive HStore path

Evidence: GraphIndexTransaction.keepBackendIndexOrder() marks HStore range indexes as order-sensitive whenever the original query has a finite limit/offset, but this gate only enables the new globally ordered merge for paging && limit <= net.kv.scanner.page.size. The fallback below still calls session.scan(...), which reaches the legacy mergeRangeScanIterators() path in NodeTxSessionProxy; its existing test asserts partition/top-work order like 3,1,4,2, not global range-key order.

Impact: multi-partition HStore range-index queries can still return the wrong first n ids, skip the wrong ids for range()/offset, or page incorrectly when the requested page size is above the scanner page size. Please route all order-sensitive HStore range-index scans through the ordered merge, or make the graph layer fall back to a correctness-preserving full ordered scan before applying offset/limit.

imbajin

This comment was marked as outdated.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jun 16, 2026

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes because the current HStore range-index fallback can turn bounded paged queries into full-range materialization in the graph layer. Please either move this to a bounded HStore ordered-merge path, or remove the fallback from this PR and leave it as a dedicated follow-up.

ConditionQuery scanQuery = query.copy();
scanQuery.page(null);
scanQuery.offset(0L);
scanQuery.limit(Query.NO_LIMIT);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

‼️ Avoid full-range materialization for bounded HStore range-index queries

This path is still risky because it preserves correctness by clearing the backend bounds and materializing the whole matched range before applying limit / offset / page in the graph layer.

Some Gremlin examples that can trigger the issue on HStore range indexes:

g.V().hasLabel('software').has('price', P.between(1, 1000000)).limit(10)
g.V().hasLabel('person').has('birth', P.between(start, end)).range(20, 40)
g.V().hasLabel('event').has('timestamp', P.gte(dayStart)).has('~page', '').limit(20)

The user asks for a bounded page, but this fallback can behave like:

range-index query + limit/page
        |
        v
clear page + offset + limit
        |
        v
scan the full matched HStore range
        |
        v
sort everything in memory
        |
        v
slice the requested page

On a large HStore deployment, a query that appears to request only 10 or 20 rows can scan and sort a very large range under index-label read locks. That is a significant hidden performance risk and does not match the expected behavior of paged range queries.

I think the preferred fix is option B: keep the ordering fix in the HStore/store-client layer with a bounded ordered merge of partition iterators, then stop as soon as enough rows are collected for offset + limit or the requested page size:

partition iterator A
partition iterator B
partition iterator C
        \ | /
   ordered k-way merge
        |
        v
stop after the requested page is satisfied

If that is too large for this PR, please consider removing this graph-layer materialized fallback from the current PR, leave a clear FIXME near the HStore range scan path, and handle globally ordered HStore range-index paging in a separate follow-up. I don't think we should merge a correctness fix that silently turns bounded range-index queries into potential full-range scans.

@dosubot dosubot Bot removed the lgtm This PR has been approved by a maintainer label Jun 16, 2026
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Jun 16, 2026

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blocking: no. Summary: No obvious issues found in the current head. Evidence: git diff --check; QueryTest/QueryResultsTest 12 tests passed; VertexCoreTest/EdgeCoreTest 438 tests passed; HstoreTableTest 2 tests passed; latest GitHub checks passed.

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label Jun 16, 2026
@imbajin

imbajin commented Jun 23, 2026

Copy link
Copy Markdown
Member

I took another look at the latest head and I don't think this should be treated as a generic pd-store timeout.

Current head: 18fe87f3c9a4e1a455340944ab5c66f175ce7f65

Current PR state:

reviewDecision:   APPROVED
mergeable:        MERGEABLE
mergeStateStatus: UNSTABLE

HugeGraph-PD & Store & Hstore CI
├─ struct: SUCCESS
├─ pd:     SUCCESS  06:00:35 -> 06:09:30
├─ store:  SUCCESS  06:00:29 -> 06:14:10
└─ hstore: CANCELLED 06:00:35 -> 12:00:50
   └─ Run core test: 06:05:23 -> 12:00:48

Other visible red check:
└─ codecov/project: FAILURE

So the failing GitHub Actions job is specifically hstore in HugeGraph-PD & Store & Hstore CI. pd and store passed on this run.

Why I don't think this is just a slow runner:

Recent successful HStore runs

run 27693769986
└─ hstore total:     about 24m26s
   Run core test:   about 10m21s

run 27893737780
└─ hstore total:     about 24m41s
   Run core test:   about 10m50s

run 28014295081
└─ hstore total:     about 24m26s
   Run core test:   about 10m29s

PR #2994 latest run 27739968315
└─ hstore total:     about 6h, then cancelled
   Run core test:   about 5h55m, then cancelled

The job was not stuck in dependency download, compile, or startup. It entered Run core test and then ran until GitHub cancelled the job at the 6-hour boundary.

The most relevant log evidence I found is the last useful stack before cancellation:

VertexCoreTest.testQueryInPageWithoutCapacity
  -> IteratorUtils.count(...)
  -> PageEntryIterator.hasNext/fetch
  -> QueryList.fetchNext
  -> GraphIndexTransaction.doIndexQueryOnce
  -> HStore iterator / KvPageScanner close

There is also a Netty ByteBuf.release() leak report around that path, but I would treat that as a symptom exposed during iterator close/cleanup, not as the primary explanation for why the core test never finishes.

The suspicious overlap with this PR is high:

PR changes
├─ QueryList.java
│  └─ paged index result fetch now keeps input order
├─ GraphIndexTransaction.java
│  └─ index scan filtering / matched-index read path changed
├─ HstoreTable.java
│  └─ adds RangeIndexEntryIterator for range-index paging
└─ VertexCoreTest.java
   └─ affected paging/index test area

Failed runtime path
└─ hstore core test
   └─ VertexCoreTest.testQueryInPageWithoutCapacity
      └─ paged index query
         └─ QueryList / GraphIndexTransaction / HStore iterator

My current hypothesis is that the HStore range-index paging cursor/page-state is not advancing correctly for this path. PageEntryIterator depends on the page returned by the lower layer to move forward. If the HStore range-index iterator returns a page state that points back to the same range/entry, the outer paging loop can repeatedly fetch the same page/range and keep the core test running until the CI timeout.

Conceptually:

g.V().has("age", 30).has("~page", "").limit(-1)
        |
        v
GraphIndexTransaction.doIndexQueryOnce()
        |
        v
HStore range-index scan
        |
        v
RangeIndexEntryIterator.pageState()
        |
        v
PageEntryIterator.fetch()
        |
        +-- if page state does not move monotonically
            fetch repeats / iterator cleanup repeats / test never completes

Given that this PR directly changes the HStore/query paging area, I don't think a rerun alone is enough evidence unless the same HStore range-index paging path is proven stable. I would not merge this head as-is. Please fix or revert the HStore range-index paging part, or move that piece into a dedicated follow-up where the storage-client page-state contract can be handled explicitly.

At minimum, I think we need a targeted regression check around:

HStore + range/index query + paging + no capacity / large limit

The Codecov project failure is still visible too, but the hstore timeout is the more important blocker because it overlaps with the changed behavior rather than looking like an unrelated external check.

@imbajin imbajin left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requesting changes for now because the previous head exposed a blocking HStore CI issue and the current head has not yet proven that path is fixed.

The detailed analysis is here: #2994 (comment)

Important context: the PR head has moved to 057b390380552ec32494639bc7f7edcac08f043d, and its checks are still queued/in progress. I don't want to over-claim that the new head still reproduces the same timeout, but the previous failure was not a generic pd-store timeout: pd and store passed, while hstore hung in Run core test for almost 6 hours and the log path overlapped with this PR's HStore/index paging changes.

Please either fix/revert the HStore range-index paging path or wait for the new head to show that hstore passes cleanly with the relevant regression coverage.

@dosubot dosubot Bot removed the lgtm This PR has been approved by a maintainer label Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Improve]: clarify ConditionQuery.condition() semantics for missing, conflicting, and multi-value conditions

3 participants